Abstract

Protein-DNA interactions are crucial for all biological processes. One of the most important 
fundamental aspects of these interactions is the process of protein searching and recognizing specific 
binding sites on DNA. A large number of experimental and theoretical investigations have been 
devoted to uncovering the molecular description of these phenomena, but many aspects of the 
mechanisms of protein search for the targets on DNA remain poorly understood. One of the most
intriguing problems is the role of multiple targets in protein search dynamics. Using a recently
developed theoretical framework we analyze this question in detail. Our method is based on a
discrete-state stochastic approach that takes into account most relevant physical-chemical processes
and leads to a fully analytical description of all dynamic properties. Specifically, systems with two
and three targets have been explicitly investigated. It is found that multiple targets in most cases 
accelerate the search in comparison with a single target situation. However, the acceleration is 
not always proportional to the number of targets. Surprisingly, there are even situations when it
takes longer to find one of the multiple targets in comparison with the single target. The outcome
depends on the spatial positions of the targets, distances between them, average scanning lengths of
protein molecules on DNA, and the total DNA length. Physical-chemical explanations of observed results
are presented. Our predictions are compared with experimental observations as well as with results 
from a continuum theory for the protein search. Extensive Monte Carlo computer simulations fully 
support our theoretical calculations. 
I. INTRODUCTION


Major cellular activities are effectively governed by multiple protein-DNA interactions
[1-3]. The starting point of these interactions is a process of protein searching for and
recognizing the specific binding sites on DNA. This is a critically important step because
it allows the genetic information contained in DNA to be effectively transferred by initiating
various biological processes [1-3]. In recent years, the fundamental processes associated with
the protein search for targets on DNA have been studied extensively using a wide variety
of experimental and theoretical methods [4-34]. Although significant progress in our
understanding of the protein search phenomena has been achieved, the full description of the
mechanisms remains a controversial and highly debated research topic [16, 24, 26, 30-32].


Experimental investigations of the protein search phenomena revealed that many proteins
find their targets on DNA very fast, and the corresponding association rates might exceed
the estimates from 3D diffusion limits [16, 24]. These surprising phenomena are known
as facilitated diffusion in the protein search field. More recent single-molecule experiments,
which can directly visualize the dynamics of individual molecules, also suggest that during
the search proteins move not only through the bulk solution via 3D diffusion, but they also
bind non-specifically to DNA where they hop in 1D fashion [8, 9, 12, 13, 19, 20, 29]. Several
theoretical approaches that incorporate the coupling between 3D diffusion and 1D sliding in
the protein search have been proposed [32], but they had variable success in
explaining all experimental observations.


One of the most interesting problems related to the protein search on DNA is the effect
of multiple targets. The question is how long it will take for the proteins to find any specific
binding site from several targets present on DNA. Naively, one could argue that in this case
the search time should be accelerated proportionally to the number of targets, i.e., the
association reaction rate should be proportional to the concentration of specific binding sites.
However, this effectively mean-field view ignores several important observations. First, it is
clear that the search time for several targets lying very close to each other generally should
not be the same as the search time for the same number of targets which are spatially
dispersed. Second, the experimentally supported complex 3D+1D search mechanism suggests
that varying spatial distributions of the specific binding sites should also affect the search
dynamics. Thus, it seems that the simple mean-field arguments should not be valid for
all conditions. Surprisingly, this very important problem was addressed only in one recent
work [20]. Hammar et al., using high-quality single-molecule measurements in living cells,
investigated the dynamics of finding the specific sites for lac repressor proteins on
DNA with two targets. It was found that the association rates increase as a function of
the distance between targets [20]. An approximate theoretical model for the protein search
with two targets was proposed. However, this theoretical approach has several problems.
It was presented for infinitely long DNA chains using a continuum approximation. At the
same time, it was shown recently that the continuum approach might lead to serious errors
and artifacts in the description of protein search dynamics [32]. In addition, this theory
predicted that the acceleration due to the presence of two targets in comparison with the
case of only one target should disappear in the limit of very large sliding lengths. This is
clearly a nonphysical result. In this limit, the protein spends most of the searching time on
DNA and it is faster to find any of two targets than one specific binding site.


In this article, we present a comprehensive theoretical method of analyzing the role of
multiple targets in the protein search on DNA. Our approach is based on a discrete-state
stochastic framework that was recently developed by one of us for the search with one
specific binding site [32]. It takes into account most relevant biochemical and biophysical
processes, and it allows us to obtain fully analytical solutions for all dynamic properties
at all conditions. One of the main results of the discrete-state stochastic method was the
construction of a dynamic phase diagram [32]. Three possible dynamic search regimes were
identified. When the protein sliding length was larger than the DNA chain length, the search
followed simple random-walk dynamics with a quadratic scaling of the search time on the
DNA length. For the sliding length smaller than the DNA length but larger than the size
of the specific binding site, the search dynamics followed a linear scaling. When the sliding
length was smaller than the target size, the search was dominated by nonspecific bindings
and unbindings without the sliding along DNA. In this paper, we extend this method to the
case of several specific binding sites at arbitrary spatial positions. It allows us to explicitly
describe the role of multiple targets and their spatial distributions in the protein search.
Our theoretical calculations agree with available experimental observations, and we also
test them in Monte Carlo computer simulations.
II. THEORETICAL METHOD


The original discrete-state stochastic approach can be generalized for any number of the
specific binding sites at arbitrary positions along the DNA chain. But to explain the main
features of our theoretical method, we analyze specifically a simpler model with only two
targets as shown in Fig. 1. A single DNA molecule with $L$ binding sites and a single
protein molecule are considered. The analysis can be easily extended to any concentration
of proteins and DNA [25]. Two of the binding sites, $i$ and $j$ ($i = m_1$ and $j = m_2$), are targets
for the protein search (see Fig. 1). The protein starts from the bulk solution that we label
as state 0. Since 3D diffusion is usually much faster than other processes in the system, we
assume that the protein can access with equal probability any site on the DNA chain (with
the corresponding total binding rate $k_{\rm on}$). While being on DNA, the protein can move with
a diffusion rate $u$ along the chain with equal probability in both directions. The protein
molecule can also dissociate from DNA with a rate $k_{\rm off}$ to the bulk solution (Fig. 1). The
search process ends when the protein reaches for the first time any of the two targets.
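The model described above can be simulated directly. The following is a minimal Gillespie-type sketch, assuming only the rules stated in this section (uniform landing with total rate $k_{\rm on}$, hopping with rate $u$ in each available direction, dissociation with rate $k_{\rm off}$, reflecting chain ends). The function name `simulate_search` and its argument order are illustrative choices and are not taken from the original simulations.

```python
import random

def simulate_search(L, targets, u, k_on, k_off, rng):
    """Return one first-passage time from the bulk (state 0) to any target.

    Sites are numbered 1..L; `targets` must contain at least one site.
    """
    targets = set(targets)
    t = rng.expovariate(k_on)   # waiting time in the bulk before binding
    n = rng.randint(1, L)       # every site is reached with equal probability
    while n not in targets:
        # End sites have only one neighbor (reflecting boundaries).
        moves = [j for j in (n - 1, n + 1) if 1 <= j <= L]
        total = u * len(moves) + k_off
        t += rng.expovariate(total)
        if rng.random() < k_off / total:
            # Dissociation, followed by rebinding at a random site.
            t += rng.expovariate(k_on)
            n = rng.randint(1, L)
        else:
            n = rng.choice(moves)
    return t

if __name__ == "__main__":
    rng = random.Random(0)
    runs = [simulate_search(100, [25, 75], 1e5, 1e5, 1e3, rng) for _ in range(2000)]
    print("estimated mean search time:", sum(runs) / len(runs))
```

Averaging many such trajectories gives a numerical estimate of the mean search time that can be compared with the analytical expressions derived below.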

The main idea of our approach is to utilize first-passage processes to describe the complex
dynamics of the protein search on DNA [32]. One can introduce a function $F_n(t)$ defined as
a probability to reach any target at time $t$ for the first time if initially (at $t = 0$) the protein
molecule starts at the state $n$ ($n = 0, 1, \ldots, L$). These first-passage probabilities evolve with
time as described by a set of the backward master equations [23, 32],

$$
\frac{dF_n(t)}{dt} = u\left[F_{n+1}(t) + F_{n-1}(t)\right] + k_{\rm off} F_0(t) - (2u + k_{\rm off}) F_n(t), \tag{1}
$$

for $2 \le n \le L - 1$. At DNA ends ($n = 1$ and $n = L$) the dynamics is slightly different,

$$
\frac{dF_1(t)}{dt} = u F_2(t) + k_{\rm off} F_0(t) - (u + k_{\rm off}) F_1(t), \tag{2}
$$

and

$$
\frac{dF_L(t)}{dt} = u F_{L-1}(t) + k_{\rm off} F_0(t) - (u + k_{\rm off}) F_L(t). \tag{3}
$$

In addition, in the bulk solution we have

$$
\frac{dF_0(t)}{dt} = \frac{k_{\rm on}}{L} \sum_{n=1}^{L} F_n(t) - k_{\rm on} F_0(t). \tag{4}
$$
Furthermore, the initial conditions require that

$$
F_{m_1}(t) = F_{m_2}(t) = \delta(t), \qquad F_{n \neq m_1, m_2}(t = 0) = 0. \tag{5}
$$

The physical meaning of this statement is that if we start at one of the two targets the search
process is finished immediately.

It is convenient to solve Eqs. (1), (2), (3) and (4) by employing Laplace transformations
of the first-passage probability functions, $\widetilde{F}_n(s) = \int_0^{\infty} e^{-st} F_n(t)\,dt$ [32]. The details of
calculations are given in the Appendix. It is important to note here that the explicit expressions
for the first-passage probability distribution functions in the Laplace form provide us with
a full dynamic description of the protein search [32]. For example, the mean first-passage
time to reach any of the target sites if the original position of the protein was in the solution
($n = 0$), which we also associate with the search time, can be directly calculated from [32]

$$
T_0 = -\left.\frac{\partial \widetilde{F}_0(s)}{\partial s}\right|_{s=0}. \tag{6}
$$
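As an independent numerical cross-check, the mean first-passage times can also be obtained without Laplace transforms. Differentiating the transformed Eqs. (1)-(4) at $s = 0$ gives a linear system for the mean times $T_n$ from each state, with $T_n = 0$ at the targets. The sketch below solves that system with NumPy; this is a swapped-in numerical technique rather than the analytical route used in the paper, and the name `mean_search_time` is hypothetical.

```python
import numpy as np

def mean_search_time(L, targets, u, k_on, k_off):
    """Mean first-passage time T_0 from the bulk to any target site.

    Unknowns are T_0 (bulk) and T_1..T_L (DNA sites). For every non-target
    state: sum_j rate(n -> j) * (T_j - T_n) = -1.
    """
    targets = set(targets)
    A = np.zeros((L + 1, L + 1))
    rhs = np.ones(L + 1)
    # Bulk state: k_on * T_0 - (k_on / L) * sum_n T_n = 1
    A[0, 0] = k_on
    A[0, 1:] = -k_on / L
    for n in range(1, L + 1):
        if n in targets:
            A[n, n] = 1.0   # the search is over at a target
            rhs[n] = 0.0
            continue
        neighbors = [j for j in (n - 1, n + 1) if 1 <= j <= L]
        A[n, n] = u * len(neighbors) + k_off
        for j in neighbors:
            A[n, j] = -u
        A[n, 0] = -k_off
    return float(np.linalg.solve(A, rhs)[0])

if __name__ == "__main__":
    print(mean_search_time(100, [25, 75], 1e5, 1e5, 1e3))
```

Because it makes no use of the closed-form solution, this routine is convenient for checking analytical search times at arbitrary target positions on short chains.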


As shown in the Appendix, the average search time is given by

$$
T_0 = \frac{k_{\rm off} L + k_{\rm on}\left[L - S_i(0)\right]}{k_{\rm on} k_{\rm off} S_i(0)}, \tag{7}
$$

where $S_i(s)$ is a new auxiliary function with a subscript specifying the number of targets
($i = 2$ for the system with two targets). For this function we have

$$
S_2(s) = \frac{(1 + y)\left[2\left(1 - y^{2L + m_1 - m_2}\right) + \left(1 - y^{m_2 - m_1}\right)\left(y^{2m_1 - 1} + y^{1 + 2(L - m_2)}\right)\right]}{(1 - y)\left(1 + y^{2m_1 - 1}\right)\left(1 + y^{1 + 2(L - m_2)}\right)\left(1 + y^{m_2 - m_1}\right)}, \tag{8}
$$

with

$$
y = \frac{s + 2u + k_{\rm off} - \sqrt{(s + 2u + k_{\rm off})^2 - 4u^2}}{2u}. \tag{9}
$$


It is important that for $m_1 = m_2$, as expected, our results reduce to expressions for the
protein search on DNA with only one target [32]. Similar procedures can be used to estimate
all other dynamic properties for the system with two targets.
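Equations (7)-(9) are straightforward to evaluate numerically. The sketch below is a direct transcription of these formulas, assuming $s = 0$ for the search time; the function names `y_root`, `s2` and `search_time` are illustrative. The root $y$ is computed in the algebraically equivalent form $2u/(b + \sqrt{b^2 - 4u^2})$ with $b = s + 2u + k_{\rm off}$, which avoids subtracting two nearly equal numbers.

```python
import math

def y_root(s, u, k_off):
    """Root y < 1 of u*y**2 - (s + 2u + k_off)*y + u = 0, Eq. (9)."""
    b = s + 2.0 * u + k_off
    disc = (s + k_off) * (s + k_off + 4.0 * u)  # equals b**2 - 4*u**2
    return 2.0 * u / (b + math.sqrt(disc))      # same root as (b - sqrt(disc)) / (2u)

def s2(y, L, m1, m2):
    """Auxiliary function S_2 of Eq. (8) for targets at sites m1 <= m2."""
    a = 2 * m1 - 1
    b = 1 + 2 * (L - m2)
    l = m2 - m1
    num = (1 + y) * (2 * (1 - y ** (2 * L + m1 - m2)) + (1 - y ** l) * (y ** a + y ** b))
    den = (1 - y) * (1 + y ** a) * (1 + y ** b) * (1 + y ** l)
    return num / den

def search_time(S, L, k_on, k_off):
    """Mean search time T_0 of Eq. (7), with S = S_i(0)."""
    return (k_off * L + k_on * (L - S)) / (k_on * k_off * S)

if __name__ == "__main__":
    u, k_on, k_off, L = 1e5, 1e5, 1e3, 1000
    y = y_root(0.0, u, k_off)
    print("two targets:", search_time(s2(y, L, 250, 750), L, k_on, k_off))
    # Setting m1 = m2 recovers the single-target case.
    print("one target :", search_time(s2(y, L, 500, 500), L, k_on, k_off))
```

Calling `s2` with `m1 == m2` gives the single-target auxiliary function, which is a quick sanity check on the transcription.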

We can extend this approach to any number of targets and to any spatial distribution
of binding sites. This is discussed in detail in the Appendix. Surprisingly, the expressions
for the search times are the same in all cases but with different $S_i$ functions that depend on
the number of specific binding sites and their spatial distributions. Analytical results for $S_i$
for the protein search on DNA with three or four targets, as well as a general procedure for
an arbitrary number of specific binding sites, are also presented in the Appendix.


III. RESULTS AND DISCUSSION 


A. Spatial Distribution of Targets 


Because our theoretical method provides explicit formulas for all relevant quantities, it 
allows us to fully explore many aspects of the protein search mechanisms. The first problem 
that we can address is related to the role of the spatial distribution of targets on the search 
dynamics. In other words, the question is how the search time is influenced by exact positions 
of all targets along the DNA. The results of our calculations for two specific binding sites 
are presented in Fig. 2. The longest search times are found when two targets are at different
ends of the molecule, and the distance between them along the DNA curve, $l = |m_1 - m_2|$,
is the largest possible and equal to $L - 1$. The search is faster if targets are moved closer to
each other and both distributed symmetrically with respect to the middle point of the DNA
molecule (Fig. 2). Moving the targets too close ($l \sim 0$) starts to increase the search time
again: see Fig. 2. For short DNA chains, it can be shown that there is an optimal distance
between two targets, $l_{\rm opt} = L/2$, that yields the fastest search (Fig. 2). It corresponds to
the optimal positions of the specific sites at $m_1 = L/4$ and $m_2 = 3L/4$.
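The optimal spacing can be located numerically by scanning all symmetric placements of the two targets and evaluating Eq. (7) with the $S_2$ function of Eq. (8). The sketch below does this for a short chain with a scanning length larger than the chain; the parameter values and the helper name `best_symmetric_spacing` are illustrative assumptions, not values taken from Fig. 2.

```python
import math

def y_root(s, u, k_off):
    """Root y < 1 of Eq. (9), written in a cancellation-free form."""
    b = s + 2.0 * u + k_off
    return 2.0 * u / (b + math.sqrt((s + k_off) * (s + k_off + 4.0 * u)))

def s2(y, L, m1, m2):
    """Auxiliary function S_2 of Eq. (8)."""
    a, b, l = 2 * m1 - 1, 1 + 2 * (L - m2), m2 - m1
    num = (1 + y) * (2 * (1 - y ** (2 * L + m1 - m2)) + (1 - y ** l) * (y ** a + y ** b))
    den = (1 - y) * (1 + y ** a) * (1 + y ** b) * (1 + y ** l)
    return num / den

def search_time(S, L, k_on, k_off):
    """Mean search time of Eq. (7)."""
    return (k_off * L + k_on * (L - S)) / (k_on * k_off * S)

def best_symmetric_spacing(L, u, k_on, k_off):
    """Scan targets at m1 and m2 = L + 1 - m1; return (shortest time, spacing l)."""
    y = y_root(0.0, u, k_off)
    best = None
    for m1 in range(1, L // 2 + 1):
        m2 = L + 1 - m1
        T = search_time(s2(y, L, m1, m2), L, k_on, k_off)
        if best is None or T < best[0]:
            best = (T, m2 - m1)
    return best

if __name__ == "__main__":
    L = 40
    T, l = best_symmetric_spacing(L, u=1e5, k_on=1e5, k_off=10.0)
    print("optimal spacing l =", l, "compared with L/2 =", L / 2)
```

On a discrete lattice the optimum can only fall on integer spacings, so the scan is expected to return a value close to, but not necessarily equal to, $L/2$.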

The last result is slightly unexpected since simple symmetry arguments suggest that
the fastest search would be observed for the uniform distribution of targets, i.e., when the
distance between the specific sites and the distance between the ends and targets are the
same, i.e., for $m_1 = L/3$ and $m_2 = 2L/3$. This is not observed in Fig. 2. To explain
this, one can argue that the search on the DNA molecule of length $L$ with $n$ targets can be
mapped into the search on $n$ DNA segments of variable lengths with only one target per each
segment. In this case, positioning each target in the middle of the corresponding segment
leads to the fastest search dynamics [32]. This suggests that the optimal distribution of
$n$ symmetrically distributed targets is a uniform distribution with the distance between two
neighboring targets equal to $L/n$. But then the first and the last targets will be separated
from the corresponding ends by a shorter distance, $L/(2n)$. This is exactly what we
see in Fig. 2 for $n = 2$ targets. The reason that distances between the ends and the closest
targets deviate from the distances between the targets is the reflecting boundary conditions
at the ends that are assumed in our model: see Eqs. (2) and (3).

The results presented in Fig. 2 also illustrate another interesting observation. Increasing
the length of DNA effectively eliminates the minimum in the search time for specific
symmetric locations of the targets. Essentially, for $L \gg 1$, which is much closer to realistic
conditions in most cases, any two positions of the targets inside the DNA chain will be optimal
and will have the same search time as long as they are not at the ends. We will discuss
the reason for this below.


B. Dynamic Phase Diagram 


One of the main advantages of our method is the ability to explicitly analyze the search 
dynamics for all ranges of relevant parameters. This allows us to construct a comprehensive 
dynamic phase diagram that delineates different search regimes. The results are presented 
in Fig. 3 for the systems with different numbers of specific binding sites. The important 
observation is that general features of the search behavior are independent of the number of 
targets. 

More specifically, there are three dynamic phases that depend on the relative values of
the length of DNA $L$, the average scanning length $\lambda = \sqrt{u/k_{\rm off}}$ and the size of the target
(taken to be equal to unity in our model). For $\lambda > L$ the random-walk regime is observed
with the search time being quadratically proportional to the size of DNA [32]. In this case,
the protein non-specifically binds DNA and it does not dissociate until it finds one of the
targets. The quadratic scaling is a result of a simple random-walk unbiased diffusion of the
protein molecule on DNA during the search. For the intermediate sliding regime, $1 < \lambda < L$,
the protein binds to DNA, scans it, unbinds and repeats this cycle on average $L/(n\lambda)$ times
($n$ is the number of targets) for symmetrically distributed specific sites. For more general
distributions the number of search cycles is also proportional to $L/\lambda$. This leads to the linear
scaling in the search times. For $\lambda < 1$ we have the jumping regime where the protein can
bind to any site on DNA and dissociate from it, but it cannot slide along the DNA chain.
The search time is again proportional to $L$ because on average the protein must check $L$
sites. These changes in the dynamic search behavior are illustrated in Fig. 4, in which the
search times as a function of the DNA lengths are presented for different scanning lengths.
The slope variation indicates a change in the scaling behavior in the search times from $L^2$
to $L$ as the DNA length increases for fixed $\lambda$.
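The change of scaling from $L^2$ to $L$ can be checked by estimating the local exponent $d \ln T_0 / d \ln L$ from Eqs. (7)-(9). The sketch below does this for two targets placed near $L/4$ and $3L/4$, doubling the chain length at fixed $\lambda$; the helper names and the specific parameter values are illustrative assumptions.

```python
import math

def y_root(s, u, k_off):
    """Root y < 1 of Eq. (9), written in a cancellation-free form."""
    b = s + 2.0 * u + k_off
    return 2.0 * u / (b + math.sqrt((s + k_off) * (s + k_off + 4.0 * u)))

def s2(y, L, m1, m2):
    """Auxiliary function S_2 of Eq. (8)."""
    a, b, l = 2 * m1 - 1, 1 + 2 * (L - m2), m2 - m1
    num = (1 + y) * (2 * (1 - y ** (2 * L + m1 - m2)) + (1 - y ** l) * (y ** a + y ** b))
    den = (1 - y) * (1 + y ** a) * (1 + y ** b) * (1 + y ** l)
    return num / den

def search_time(S, L, k_on, k_off):
    """Mean search time of Eq. (7)."""
    return (k_off * L + k_on * (L - S)) / (k_on * k_off * S)

def two_target_time(L, lam, u=1e5, k_on=1e5):
    """Search time for targets near L/4 and 3L/4 at scanning length lam."""
    k_off = u / lam ** 2          # from lam = sqrt(u / k_off)
    y = y_root(0.0, u, k_off)
    m1 = (L + 2) // 4
    m2 = L + 1 - m1
    return search_time(s2(y, L, m1, m2), L, k_on, k_off)

def local_exponent(L, lam):
    """Estimate d ln T / d ln L by doubling the DNA length."""
    return math.log(two_target_time(2 * L, lam) / two_target_time(L, lam)) / math.log(2.0)

if __name__ == "__main__":
    print("lambda >> L:", local_exponent(100, 1e4))     # random-walk regime
    print("1 < lambda << L:", local_exponent(10000, 10.0))  # sliding regime
```

An exponent near 2 is expected when $\lambda \gg L$ and an exponent near 1 when $\lambda \ll L$, in line with the regimes described above.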

It is also important to note here that the concept of optimal target positions
does not apply in the sliding regime ($1 < \lambda < L$) because the protein during the search
frequently unbinds from the DNA, losing all memory about what it already scanned. This
concept also cannot be defined in the jumping regime where the protein does not slide at
all. From this point of view, all positions of the targets are equivalent. The only two
positions that differ from others are the end sites in the sliding regime. This is because they
can be reached only via one neighboring site, while all other sites can be reached via two
neighboring sites (see Fig. 1).


C. Acceleration of the Search


The most interesting question for this system is how multiple targets quantitatively
affect the search dynamics. To quantify this we define a new function, $a_n$, which
we call the acceleration,

$$
a_n = \frac{T_0(1)}{T_0(n)}. \tag{10}
$$

This is the ratio of the search times for the case of one target and for the case of $n$ targets.
The parameter $a_n$ gives a numerical value of how the presence of multiple targets increases
the rate of association to any specific binding site. The results for acceleration are presented
in Figs. 5-7.

First, we analyze the situation when targets in all cases are in the optimal symmetric
positions, which is shown in Fig. 5. For DNA with the single target it is in the middle of the
chain, while for DNA with $n$ targets they are distributed uniformly, as we discussed above,
with the distance $L/n$ between the internal targets and $L/(2n)$ between the boundary targets and
the DNA ends. The acceleration for these conditions depends on the dynamic search regimes,
and it ranges from $n$ to $n^2$: see Fig. 5. For the case of $\lambda < L$ (jumping and sliding
regimes), on average the number of search excursions to DNA before finding the specific
site is proportional to $L/n$, and this leads to a linear behavior in the acceleration ($a_n \sim n$). For
$\lambda > L$ (random-walk regime), the search is one-dimensional and the protein must diffuse on
average the distance $L/n$ before it can find any of the targets. The quadratic scaling for
the simple random walk naturally explains the acceleration in the search in this dynamic
regime, $a_n \sim n^2$.
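The two limiting values of the acceleration for $n = 2$ can be reproduced from Eqs. (7)-(10). The sketch below places the single target in the middle of the chain and the two targets near $L/4$ and $3L/4$, and evaluates $a_2$ in the random-walk and sliding regimes; the function names and parameter values are illustrative assumptions.

```python
import math

def y_root(s, u, k_off):
    """Root y < 1 of Eq. (9), written in a cancellation-free form."""
    b = s + 2.0 * u + k_off
    return 2.0 * u / (b + math.sqrt((s + k_off) * (s + k_off + 4.0 * u)))

def s2(y, L, m1, m2):
    """Auxiliary function S_2 of Eq. (8); m1 == m2 gives the single-target case."""
    a, b, l = 2 * m1 - 1, 1 + 2 * (L - m2), m2 - m1
    num = (1 + y) * (2 * (1 - y ** (2 * L + m1 - m2)) + (1 - y ** l) * (y ** a + y ** b))
    den = (1 - y) * (1 + y ** a) * (1 + y ** b) * (1 + y ** l)
    return num / den

def search_time(S, L, k_on, k_off):
    """Mean search time of Eq. (7)."""
    return (k_off * L + k_on * (L - S)) / (k_on * k_off * S)

def acceleration_two_targets(L, lam, u=1e5, k_on=1e5):
    """a_2 = T_0(1) / T_0(2), Eq. (10), for optimally placed targets."""
    k_off = u / lam ** 2
    y = y_root(0.0, u, k_off)
    m = (L + 1) // 2                      # single target in the middle
    T1 = search_time(s2(y, L, m, m), L, k_on, k_off)
    m1 = (L + 2) // 4                     # two targets near L/4 and 3L/4
    m2 = L + 1 - m1
    T2 = search_time(s2(y, L, m1, m2), L, k_on, k_off)
    return T1 / T2

if __name__ == "__main__":
    print("random-walk regime:", acceleration_two_targets(101, 1e4))
    print("sliding regime    :", acceleration_two_targets(10001, 10.0))
```

The first value should be close to $n^2 = 4$ and the second close to $n = 2$, which is the range quoted above for Fig. 5.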

However, the acceleration is also affected by the distance between the targets. If we
maintain the optimal conditions for the DNA with one target but vary the distance
between multiple targets, while keeping the overall symmetry, we obtain the results shown in
Fig. 6. In this case, putting targets too close to each other or moving them apart lowers the
acceleration. Eventually, there will be no acceleration for these conditions ($a_n = 1$). But the
results are much more interesting if we consider the non-symmetric distributions of targets.
Surprisingly, the search time for the system with multiple targets can be even slower than
for the single target system! This is shown in Fig. 7 where $a_n$ can be as low as 1/4 for
the two-target system in the random-walk regime, or it can reach the value of 1/2 in the
sliding and jumping regimes (not shown). The single target in the optimal position in
the middle of the DNA chain can be found much faster in comparison with the case of two
targets sitting near one of the ends.
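The deceleration can be seen directly from Eqs. (7)-(10) by fixing one target at the end of the chain and moving the second one. The sketch below, with illustrative parameter values and hypothetical helper names, evaluates $a_2$ in the random-walk regime for a second target adjacent to the first and for a second target at the opposite end.

```python
import math

def y_root(s, u, k_off):
    """Root y < 1 of Eq. (9), written in a cancellation-free form."""
    b = s + 2.0 * u + k_off
    return 2.0 * u / (b + math.sqrt((s + k_off) * (s + k_off + 4.0 * u)))

def s2(y, L, m1, m2):
    """Auxiliary function S_2 of Eq. (8); m1 == m2 gives the single-target case."""
    a, b, l = 2 * m1 - 1, 1 + 2 * (L - m2), m2 - m1
    num = (1 + y) * (2 * (1 - y ** (2 * L + m1 - m2)) + (1 - y ** l) * (y ** a + y ** b))
    den = (1 - y) * (1 + y ** a) * (1 + y ** b) * (1 + y ** l)
    return num / den

def search_time(S, L, k_on, k_off):
    """Mean search time of Eq. (7)."""
    return (k_off * L + k_on * (L - S)) / (k_on * k_off * S)

def acceleration(L, lam, m1, m2, u=1e5, k_on=1e5):
    """a_2 of Eq. (10): single target in the middle vs. targets at m1 <= m2."""
    k_off = u / lam ** 2
    y = y_root(0.0, u, k_off)
    m = (L + 1) // 2
    T1 = search_time(s2(y, L, m, m), L, k_on, k_off)
    T2 = search_time(s2(y, L, m1, m2), L, k_on, k_off)
    return T1 / T2

if __name__ == "__main__":
    L, lam = 101, 1e4  # scanning length much larger than the chain
    print("targets at sites 1 and 2:", acceleration(L, lam, 1, 2))
    print("targets at sites 1 and L:", acceleration(L, lam, 1, L))
```

With both targets crowded at one end, most of the chain lies a distance of order $L$ from the nearest target, compared with $L/2$ for a single central target, which is the origin of the factor close to 1/4.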

These observations suggest that the degree of acceleration of the search process due
to the presence of multiple targets is not always a linear function of the number of specific
binding sites. It depends on the nature of the dynamic search phase, the distance between
the targets and the spatial distribution of the targets. Varying these parameters can lead
to larger accelerations as well as to unexpected decelerations. It is a consequence of the
complex mechanism of the protein search for targets on DNA that combines 3D and 1D
motions. This is the main result of our paper.


D. Comparison with Continuum Model and with Experiments 


Recently, single-molecule experiments measured the facilitated search of lac repressor
proteins on DNA with two identical specific binding sites [20]. These experiments show that
the association rate increases before reaching the saturation with the increase in the distance
between the targets. Our theoretical model successfully describes these measurements, as
shown in Fig. 8. Fitting these data, we estimate the 1D diffusion rate for the lac repressors
as $u \approx 7 \times 10^{5}$ s$^{-1}$, which is consistent with in vitro measured values. Our estimates for
the sliding length, $\lambda \approx 25$ bp, and for the non-specific association to DNA, $k_{\rm on} \approx 6.4 \times 10^{4}$
s$^{-1}$, also agree with experimental observations [20].

It is important to compare our results with predictions from the theoretical model
presented in Ref. [20]. This continuum model was developed assuming that the length of DNA
is extremely long, $L \gg 1$. It was shown that the acceleration for the search for the case of
two targets can be simply written as [20]

$$
a_2 = 1 + \tanh\left(\frac{l}{2\lambda}\right), \tag{11}
$$


where $l$ is the distance between targets. The comparison between two theoretical approaches
is given in Figs. 9 and 10. One can see from Fig. 9 that both models agree for very large
DNA lengths, $L \gg 1$, while for shorter DNA chains there are significant deviations. The
continuum theory [20] predicts that the acceleration is always a linear or sub-linear function
of the number of targets, i.e., $1 \le a_n \le n$. Our model shows that the acceleration can
have a non-linear dependence on the number of the targets, $a_n \sim n^2$. More specifically, this
can be seen in Fig. 10, where the acceleration is presented as a function of the scanning
length $\lambda$. The prediction of the continuum theory that for $\lambda \gg 1$ the acceleration always
approaches unity is unphysical. Clearly, if we consider, e.g., the optimal distribution of
targets, then the larger the number of specific binding sites, the shorter the search time. The
reason for the failure of the continuum model at this limit is its inability to properly account
for all dynamic search regimes. This analysis shows that the continuum model [20] has a
very limited application, while our theoretical approach is consistent with all experimental
observations and provides a valid physical picture for all conditions.
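The comparison between the two approaches can be reproduced numerically by evaluating Eq. (11) next to the discrete-state acceleration obtained from Eqs. (7)-(10). The sketch below places the two targets symmetrically around the chain center; the helper names `a2_discrete` and `a2_continuum` and the chosen parameter values are illustrative assumptions.

```python
import math

def y_root(s, u, k_off):
    """Root y < 1 of Eq. (9), written in a cancellation-free form."""
    b = s + 2.0 * u + k_off
    return 2.0 * u / (b + math.sqrt((s + k_off) * (s + k_off + 4.0 * u)))

def s2(y, L, m1, m2):
    """Auxiliary function S_2 of Eq. (8); m1 == m2 gives the single-target case."""
    a, b, l = 2 * m1 - 1, 1 + 2 * (L - m2), m2 - m1
    num = (1 + y) * (2 * (1 - y ** (2 * L + m1 - m2)) + (1 - y ** l) * (y ** a + y ** b))
    den = (1 - y) * (1 + y ** a) * (1 + y ** b) * (1 + y ** l)
    return num / den

def search_time(S, L, k_on, k_off):
    """Mean search time of Eq. (7)."""
    return (k_off * L + k_on * (L - S)) / (k_on * k_off * S)

def a2_discrete(L, lam, l, u=1e5, k_on=1e5):
    """Discrete-state acceleration for two targets a distance l apart."""
    k_off = u / lam ** 2
    y = y_root(0.0, u, k_off)
    m = (L + 1) // 2
    m1 = m - l // 2
    m2 = m1 + l
    T1 = search_time(s2(y, L, m, m), L, k_on, k_off)
    T2 = search_time(s2(y, L, m1, m2), L, k_on, k_off)
    return T1 / T2

def a2_continuum(l, lam):
    """Continuum prediction of Eq. (11)."""
    return 1.0 + math.tanh(l / (2.0 * lam))

if __name__ == "__main__":
    # Long DNA, sliding regime: the two models should nearly coincide.
    print(a2_discrete(10001, 10.0, 10), a2_continuum(10, 10.0))
    # Short DNA, lambda >> L: the continuum result stays close to 1.
    print(a2_discrete(101, 1e4, 50), a2_continuum(50, 1e4))
```

For a long chain with $1 < \lambda \ll L$ the discrete expression reduces to approximately $2/(1 + y^l)$, which coincides with Eq. (11) when $y \approx e^{-1/\lambda}$; the disagreement appears only when the scanning length becomes comparable to or larger than $L$.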


IV. SUMMARY AND CONCLUSIONS 

We investigated theoretically the effect of multiple targets in the protein search for
specific binding sites on DNA. This was done by extending and generalizing the discrete-state
stochastic method, originally developed for single targets, that explicitly takes into
account the most important biochemical and biophysical processes. Using the first-passage
processes, all dynamic properties of the system can be directly evaluated. It was found that
the search dynamics is affected by the spatial distribution of the targets for not very long
DNA chains. There are optimal positions for specific sites for which the search times are
minimal. We argued that this optimal distribution is almost uniform with a correction due
to the DNA chain ends. We also constructed a dynamic phase diagram for the different
search regimes. It was shown that for any number of targets there are always three phases,
which are determined by comparing the DNA length, the scanning length and the size of
the target. Furthermore, we investigated the quantitative acceleration in the search due
to the presence of multiple targets for various sets of conditions. It was found that
the acceleration is linearly proportional to the number of targets when the scanning length
is less than the DNA length. For larger scanning lengths, the acceleration becomes stronger,
with a quadratic dependence on the number of targets. However, changing the distances
between the targets generally decreases the effect of acceleration. Unexpectedly, we found
that varying the spatial distributions can also reverse the behavior: it might take longer
to find the specific site in the system with multiple targets in comparison with a properly
positioned single target. Our model allows us to explain this complex behavior using simple
physical-chemical arguments. In addition, we applied our theoretical analysis to describe
experimental data, and it is shown that the obtained dynamic parameters are consistent
with measured experimental quantities. A comparison between our discrete-state theoretical
method and the continuum model is also presented. We show that the continuum model has
a limited range of applicability, and it produces unphysical behavior in some limiting
cases. At the same time, our approach is fully consistent for all sets of parameters. Our
theoretical predictions were also fully validated with Monte Carlo computer simulations.


The presented theoretical model seems to be successful in explaining the complex protein
search dynamics in the systems with multiple targets. One of the main advantages of
the method is the ability to have a fully analytical description for all dynamic properties
in the system. However, one should remember that this approach is still quite oversimplified,
and it neglects many realistic features of the protein-DNA interactions. For example,
the DNA molecule is assumed to be frozen, different protein conformations that are observed
in experiments are not taken into account, and the possibility of correlations between 3D
and 1D motions is also not considered. It will be critically important to test the presented
theoretical ideas in experiments as well as in more advanced theoretical methods.
APPENDIX: EXPLICIT CALCULATIONS OF FIRST-PASSAGE PROBABILITY
FUNCTIONS AND AVERAGE SEARCH TIMES

This appendix includes detailed derivations of the equations from the main text and 
explicit expressions for functions utilized in our calculations. 

To solve the backward master equations (1)-(5) for the system with two targets we use
the Laplace transformation, which leads to

$$
(s + 2u + k_{\rm off})\,\widetilde{F}_n(s) = u\left[\widetilde{F}_{n+1}(s) + \widetilde{F}_{n-1}(s)\right] + k_{\rm off}\,\widetilde{F}_0(s); \tag{12}
$$

$$
(s + u + k_{\rm off})\,\widetilde{F}_1(s) = u\,\widetilde{F}_2(s) + k_{\rm off}\,\widetilde{F}_0(s); \tag{13}
$$

$$
(s + u + k_{\rm off})\,\widetilde{F}_L(s) = u\,\widetilde{F}_{L-1}(s) + k_{\rm off}\,\widetilde{F}_0(s); \tag{14}
$$

$$
(s + k_{\rm on})\,\widetilde{F}_0(s) = \frac{k_{\rm on}}{L} \sum_{n=1}^{L} \widetilde{F}_n(s); \tag{15}
$$

with the condition that

$$
\widetilde{F}_{m_1}(s) = \widetilde{F}_{m_2}(s) = 1. \tag{16}
$$

We are looking for the solution of these equations in the form $\widetilde{F}_n(s) = A y^n + B$, where $A$
and $B$ are unknown coefficients that will be determined after the substitution of the solution
into Eqs. (12), (13), (14) and (15). This gives the following expression,

$$
(s + 2u + k_{\rm off})(A y^n + B) = u\left[A y^{n+1} + B + A y^{n-1} + B\right] + k_{\rm off}\,\widetilde{F}_0(s). \tag{17}
$$

After rearranging, we obtain

$$
A\left[u y^{n+1} - (s + 2u + k_{\rm off}) y^n + u y^{n-1}\right] = (s + k_{\rm off}) B - k_{\rm off}\,\widetilde{F}_0(s). \tag{18}
$$
Requiring the right-hand side of this expression to be equal to zero yields

$$
B = \frac{k_{\rm off}}{s + k_{\rm off}}\,\widetilde{F}_0(s). \tag{19}
$$

Since the parameter $A \neq 0$, we can find $y$ by solving

$$
u y^{n+1} - (s + 2u + k_{\rm off}) y^n + u y^{n-1} = 0, \tag{20}
$$

or

$$
u y^2 - (s + 2u + k_{\rm off}) y + u = 0. \tag{21}
$$

There are two roots of this quadratic equation,

$$
y_1 = \frac{s + 2u + k_{\rm off} - \sqrt{(s + 2u + k_{\rm off})^2 - 4u^2}}{2u}, \tag{22}
$$

and

$$
y_2 = \frac{s + 2u + k_{\rm off} + \sqrt{(s + 2u + k_{\rm off})^2 - 4u^2}}{2u}, \tag{23}
$$

with $y_2 = 1/y_1$.


The next step is to notice that two targets at the positions $m_1$ and $m_2$ divide the DNA
chain into 3 segments which can be analyzed separately. Then the general solution should
have the form

$$
\widetilde{F}_n(s) = A_1 y^n + A_2 y^{-n} + B, \tag{24}
$$

where the parameter $B$ is specified by Eq. (19) and $y = y_1$. Using the corresponding boundary
conditions, it can be shown that for $1 \le n \le m_1$

$$
\widetilde{F}_n(s) = \frac{(1 - B)\left(y^n + y^{1-n}\right)}{y^{m_1} + y^{1-m_1}} + B, \tag{25}
$$

while for $m_1 \le n \le m_2$ we have

$$
\widetilde{F}_n(s) = \frac{(1 - B)\left(y^n + y^{m_1+m_2-n}\right)}{y^{m_1} + y^{m_2}} + B, \tag{26}
$$

and for $m_2 \le n \le L$

$$
\widetilde{F}_n(s) = \frac{(1 - B)\left(y^{n-L} + y^{1+L-n}\right)}{y^{m_2-L} + y^{1+L-m_2}} + B. \tag{27}
$$

This leads to the following expression for $\widetilde{F}_0(s)$:

$$
\widetilde{F}_0(s) = \frac{k_{\rm on}(k_{\rm off} + s)\,S_i(s)}{L s (s + k_{\rm on} + k_{\rm off}) + k_{\rm on} k_{\rm off}\,S_i(s)}, \tag{28}
$$

where the auxiliary function $S_i(s)$ is introduced via the following relation

$$
\sum_{n=1}^{L} \widetilde{F}_n(s) = (1 - B)\,S_i(s) + B L. \tag{29}
$$

Note that Eq. (28) is identical to the corresponding equation for the single-target case [32],
but with the different auxiliary function $S_i(s)$.
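The piecewise solution can be verified numerically by substituting Eqs. (25)-(28) back into the transformed master equations (12)-(16) and checking that all residuals vanish. The sketch below does this for one arbitrary set of parameters; the function names `laplace_solution` and `residuals` are hypothetical, and $S_2(s)$ is obtained by summing the segment functions directly rather than from a closed form.

```python
import math

def laplace_solution(s, L, m1, m2, u, k_on, k_off):
    """Return [F_0(s), F_1(s), ..., F_L(s)] from Eqs. (19) and (25)-(29)."""
    b = s + 2.0 * u + k_off
    y = 2.0 * u / (b + math.sqrt((s + k_off) * (s + k_off + 4.0 * u)))  # y = y_1

    def shape(n):
        # Segment functions multiplying (1 - B) in Eqs. (25)-(27).
        if n <= m1:
            return (y ** n + y ** (1 - n)) / (y ** m1 + y ** (1 - m1))
        if n <= m2:
            return (y ** n + y ** (m1 + m2 - n)) / (y ** m1 + y ** m2)
        return (y ** (n - L) + y ** (1 + L - n)) / (y ** (m2 - L) + y ** (1 + L - m2))

    g = [shape(n) for n in range(1, L + 1)]
    S = sum(g)                                   # auxiliary function S_2(s), Eq. (29)
    F0 = k_on * (k_off + s) * S / (L * s * (s + k_on + k_off) + k_on * k_off * S)
    B = k_off * F0 / (s + k_off)                 # Eq. (19)
    return [F0] + [(1.0 - B) * gn + B for gn in g]

def residuals(F, s, L, m1, m2, u, k_on, k_off):
    """Left-hand side minus right-hand side of Eqs. (12)-(16)."""
    res = [(s + k_on) * F[0] - k_on / L * sum(F[1:])]        # Eq. (15)
    for n in range(1, L + 1):
        if n in (m1, m2):
            res.append(F[n] - 1.0)                            # Eq. (16)
            continue
        nb = [F[j] for j in (n - 1, n + 1) if 1 <= j <= L]    # one neighbor at the ends
        res.append((s + u * len(nb) + k_off) * F[n] - u * sum(nb) - k_off * F[0])
    return res

if __name__ == "__main__":
    args = (0.3, 12, 4, 9, 1.0, 2.0, 0.5)
    F = laplace_solution(*args)
    print("largest residual:", max(abs(r) for r in residuals(F, *args)))
```

The check also confirms the normalization $\widetilde{F}_0(s = 0) = 1$, i.e., the protein eventually finds a target with probability one.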


Finally, we can obtain the explicit expressions for the search times as given in the main
text in Eq. (7). The explicit form of the search time depends on the auxiliary functions $S_i$,
which can be directly evaluated. For example, for the two targets we have

$$
S_2(s) = \sum_{n=1}^{m_1 - 1} \frac{y^n + y^{1-n}}{y^{m_1} + y^{1-m_1}}
 + \sum_{n=m_1}^{m_2} \frac{y^n + y^{m_1+m_2-n}}{y^{m_1} + y^{m_2}}
 + \sum_{n=m_2+1}^{L} \frac{y^{n-L} + y^{1+L-n}}{y^{m_2-L} + y^{1+L-m_2}}, \tag{30}
$$

which after simplifications leads to Eq. (8) in the main text. Similar analysis can be done
for any number of targets with arbitrary positions along the chain. The final expression for
the search times is the same in all cases [given by Eq. (7)], but with the different auxiliary
functions $S_i(s)$. When the protein molecule searches the DNA with three targets ($i = 3$), it
can be shown that

$$
S_3(s) = \frac{1}{y - 1}\left[\frac{y^{2+2L} - y^{2m_3}}{y^{1+2L} + y^{2m_3}} - \frac{y^2 - y^{2m_1}}{y + y^{2m_1}}
 - (1 + y)\left(\frac{y^{m_1} - y^{m_2}}{y^{m_1} + y^{m_2}} + \frac{y^{m_2} - y^{m_3}}{y^{m_2} + y^{m_3}}\right)\right]. \tag{31}
$$

For the system with four targets ($i = 4$) we obtain

$$
S_4(s) = \frac{1}{y - 1}\left[\frac{y^{2+2L} - y^{2m_4}}{y^{1+2L} + y^{2m_4}} - \frac{y^2 - y^{2m_1}}{y + y^{2m_1}}
 - (1 + y)\left(\frac{y^{m_1} - y^{m_2}}{y^{m_1} + y^{m_2}} + \frac{y^{m_2} - y^{m_3}}{y^{m_2} + y^{m_3}} + \frac{y^{m_3} - y^{m_4}}{y^{m_3} + y^{m_4}}\right)\right]. \tag{32}
$$
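The structure of Eqs. (30)-(32) suggests a simple way to assemble $S_i$ for any number of targets: each target contributes 1, each end segment contributes the closed form of the first sum in Eq. (30), and each pair of neighboring targets contributes the closed form of the interior sum without its end points. The sketch below is one such assembly under these assumptions; the function name `s_general` is hypothetical and the piece formulas are obtained here by summing the geometric series in Eq. (30).

```python
def s_general(y, L, targets):
    """Auxiliary function S_i for targets at arbitrary distinct sites (1..L)."""
    t = sorted(targets)

    def end_piece(m):
        # Sum over the m - 1 non-target sites between a reflecting end and
        # a target located m sites from that end (first sum in Eq. (30)).
        return (y ** 2 - y ** (2 * m)) / ((1 - y) * (y + y ** (2 * m)))

    def gap_piece(l):
        # Sum over the l - 1 non-target sites between two targets a distance l apart.
        return 2 * (y - y ** l) / ((1 - y) * (1 + y ** l))

    total = len(t)                         # every target site contributes 1
    total += end_piece(t[0])               # left end of the chain
    total += end_piece(L + 1 - t[-1])      # right end, by mirror symmetry
    for left, right in zip(t, t[1:]):
        total += gap_piece(right - left)
    return total

def search_time(S, L, k_on, k_off):
    """Mean search time of Eq. (7)."""
    return (k_off * L + k_on * (L - S)) / (k_on * k_off * S)

if __name__ == "__main__":
    y = 0.9  # illustrative value of y(s = 0); it is fixed by u and k_off via Eq. (22)
    for sites in ([50], [25, 75], [17, 50, 83], [13, 38, 63, 88]):
        print(len(sites), "targets: S =", s_general(y, 100, sites))
```

For two, three and four targets this construction reproduces Eqs. (8), (31) and (32), respectively, and inserting the result into Eq. (7) gives the corresponding search time.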
Fig. 1. A general view of the discrete-state stochastic model for the protein search on DNA
with two targets. There are $L - 2$ nonspecific and 2 specific binding sites on the DNA chain.
A protein molecule can diffuse along the DNA with the rate $u$, or it might dissociate into
the solution with the rate $k_{\rm off}$. From the solution, the protein can attach to any position
on DNA with the total rate $k_{\rm on}$. The search process is considered to be completed when the
protein binds for the first time to any of the two targets at the position $m_1$ or $m_2$.

Fig. 2. Normalized search times as a function of the normalized distance between two
targets. The targets are positioned symmetrically with respect to the center of the DNA
chain. The parameters used in calculations are the following: $u = k_{\rm on} = 10^{5}$ s$^{-1}$ and
$k_{\rm off} = 10^{3}$ s$^{-1}$. The scanning length $\lambda$ is varied by changing $k_{\rm off}$. Solid curves are theoretical
predictions, symbols are from Monte Carlo computer simulations.

Fig. 3. Dynamic phase diagram for the protein search with multiple targets. Search times
as a function of the scanning length are shown for systems with one, two or three targets.
The parameters used in calculations are the following: $L = 10001$ bp; and $u = k_{\rm on} = 10^{5}$
s$^{-1}$. The scanning length $\lambda$ is varied by changing $k_{\rm off}$.

Fig. 4. Protein search times as a function of DNA length for different scanning lengths
for the system with two targets. The parameters used in calculations are the following:
$u = k_{\rm on} = 10^{5}$ s$^{-1}$. Solid curves are theoretical predictions, symbols are from Monte Carlo
computer simulations. The scanning length $\lambda$ is varied by changing $k_{\rm off}$.

Fig. 5. Acceleration in the search times as a function of the scanning length for the systems
with two and three targets. The parameters used in calculations are the following: $u =
k_{\rm on} = 10^{5}$ s$^{-1}$. The scanning length $\lambda$ is varied by changing $k_{\rm off}$.

Fig. 6. Acceleration in the search times as a function of the normalized distance between
the targets for the systems with two and three targets. The single target is in the middle
of the DNA chain. The multiple-target systems are symmetric but not optimal. The parameters
used in calculations are the following: $u = k_{\rm on} = 10^{6}$ s$^{-1}$; $k_{\rm off} = 10^{-4}$ s$^{-1}$ and $L = 10^{5}$ bp.

Fig. 7. Acceleration in the search times as a function of the normalized distance between
the targets for the system with two targets. The single target is in the middle of the DNA
chain. In the two-target system one of the specific binding sites is fixed at the end and the
position of the second one is varied. The parameters used in calculations are the following:
$u = k_{\rm on} = 10^{5}$ s$^{-1}$; $k_{\rm off} = 10^{-4}$ s$^{-1}$ and $L = 10^{5}$.

Fig. 8. Describing the experimental data from Ref. [20] using Eq. (7). Parameters obtained
from the best fit are discussed in the text.

Fig. 9. Comparison of theoretical predictions for the acceleration as a function of the
distance between the specific binding sites for the system with two targets for different DNA
lengths. Targets are distributed symmetrically with respect to the middle of the DNA chain.
Solid curves are discrete-state predictions, dashed curves are from the continuum model from
Ref. [20]. The parameters used in calculations are the following: $u = k_{\rm on} = 10^{5}$ s$^{-1}$; and
$k_{\rm off} = 10^{3}$ s$^{-1}$.

Fig. 10. Comparison of theoretical predictions for the acceleration as a function of the
scanning length for the system with two targets for different DNA lengths. Targets are in
the optimal symmetric positions. Solid curves are discrete-state predictions, dashed
curves are from the continuum model from Ref. [20]. The parameters used in calculations
are the following: $u = k_{\rm on} = 10^{5}$ s$^{-1}$. The scanning length $\lambda$ is varied by changing $k_{\rm off}$.
Figure 2. Lange, Kochugaeva and Kolomeisky